Use hashes for language service cache keys #6058
In case folks are wondering about collisions with the

    // Birthday-problem probability of at least one collision among
    // x items drawn uniformly from `space` possible hash values.
    let calcProb x space =
        1.0 - ((space - 1.0) / space) ** (x * (x - 1.0) / 2.0)

    let probs () =
        // A 32-bit hash has 2^32 possible values.
        let space = 2.0 ** 32.0
        let probs =
            // File counts from 10^1 up to 10^10.
            [ for x in 1 .. 10 -> (10.0 ** float x) ]
            |> List.map (fun powerOfTen -> bigint powerOfTen, calcProb powerOfTen space)
        for (power, prob) in probs do
            printfn "Num files: %A\nProbability of collision: %0.3f percent\n" power (prob * 100.0)

Result:

Keep in mind that the hash code is not the only item used to form a key; files must also share the same name to be considered a match.
I'm not entirely sure I follow your hash collision probability test, but if the results are correct, we'd need some serious checking of the hash algorithm. Is this the same as the string itself? Is this solution-wide or per project? Though even with those numbers, it's still a big improvement over full text matching.
Oh wait, you're hashing over the checksum, as opposed to the full text. And your numbers are about that, not about a possible underlying existing hash function. I spoke too soon.
It's a birthday problem applied over the total possible hashes you can get with
Re hash collisions - in this context we'd only be interested in two file contents with the same name and hash, which were a "small edit distance" away from each other. Which is exceptionally unlikely.
Does this affect Right now I'm using
Yes, this shouldn't have any observable effect (statistically speaking) aside from a reduction in memory usage. If you incur 10k or more changes to the same file in the same edit session (and those changes are spaced out long enough to end up getting cached), there's a 1.157% chance of a collision, which would result in a re-parse, but this is very much in the realm of unlikely.
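For anyone checking the 1.157% figure: it appears to come straight from the birthday-problem formula that calcProb implements, evaluated at x = 10^4 cached versions and a 32-bit hash space N = 2^32. The symbols x and N and the exponential approximation in the middle step are only used here for illustration:

$$
P(x, N) = 1 - \left(\frac{N - 1}{N}\right)^{x(x-1)/2}
$$

$$
P(10^4, 2^{32}) = 1 - \left(1 - 2^{-32}\right)^{49\,995\,000} \approx 1 - e^{-49\,995\,000 / 2^{32}} \approx 1 - e^{-0.01164} \approx 0.01157
$$

That is roughly 1.157 percent, and it only applies to versions of a file with the same name, since the name is also part of the key.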
* Use hashes for language service cache keys
* Fix dumb thing I forgot
* I guess this is required ayyy lmao
* Override HashCode instead and do a few less computations
* Remove ToString in obj expression
Fixes #6028 and is an evolved version of #5944
In #6001 we replaced strings with ISourceText in the language service, reducing huge amounts of allocations in VS. However, it did not address the issue that our language service caches still use the full text of source as keys. This PR continues what #5944 does, except it uses the same hashing algorithm that Roslyn uses for the actual RoslynSourceText type.
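To make the idea concrete, here is a minimal sketch of keying a cache on a file name plus a 32-bit hash instead of the full source text. It is not the code from this PR: CacheKey, makeKey, cache and getOrParse are hypothetical names, and String.GetHashCode stands in for the Roslyn checksum-based hash purely for illustration.

```fsharp
open System.Collections.Generic

// Hypothetical cache key: the file name plus a 32-bit hash of the text,
// rather than the full source text itself.
[<Struct>]
type CacheKey = { FileName: string; TextHash: int }

// Stand-in hash (an assumption for this sketch): String.GetHashCode.
// The PR description says it uses Roslyn's hashing algorithm instead.
let makeKey (fileName: string) (text: string) : CacheKey =
    { FileName = fileName; TextHash = text.GetHashCode() }

// Hypothetical cache of parse results keyed by (name, hash).
let cache = Dictionary<CacheKey, string>()

// Returns the cached result when the same name and hash were seen before;
// otherwise runs `parse` and stores its result.
let getOrParse (parse: string -> string) fileName text =
    let key = makeKey fileName text
    match cache.TryGetValue key with
    | true, cached -> cached
    | _ ->
        let result = parse text
        cache.[key] <- result
        result
```

With a key like this, each cache entry holds a short fixed-size hash rather than a second copy of the file contents, which is where the memory reduction discussed above would come from. A collision requires two different texts with the same file name and the same hash.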